The Memory logP Model of Local Communication
Abstract
Data movement across a memory hierarchy can severely impact application execution time. For example, on the fast interconnect of the Origin 2000, three- and four-fold increases in communication cost are not uncommon for small messages (~1K) stored non-contiguously. Simple, accurate predictions of communication time in hierarchical memories will identify bottlenecks in communication performance during algorithm design. We present a simple and useful model of point-to-point memory communication, inspired by LogP, to predict and analyze the latency of memory copy, pack and unpack for varying memory access patterns. We use our model to isolate the contributions of hardware, middleware, and software to data transfers on Intel- and MIPS-based platforms.
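As a rough illustration of the pack and unpack operations the abstract refers to, the following minimal Python sketch gathers strided blocks into a contiguous buffer and scatters them back. The function names, the block/stride/count parameterisation, and the timing helper are assumptions made for illustration only; they are not the paper's model, parameters, or benchmark code.

```python
import time


def pack(src, block, stride, count):
    """Gather `count` blocks of `block` bytes, spaced `stride` bytes apart
    in `src`, into one contiguous buffer (hypothetical helper)."""
    if count > 0 and (count - 1) * stride + block > len(src):
        raise ValueError("source buffer too small for this access pattern")
    out = bytearray(block * count)
    for i in range(count):
        out[i * block:(i + 1) * block] = src[i * stride:i * stride + block]
    return out


def unpack(packed, dst, block, stride, count):
    """Scatter a contiguous buffer back into `dst` using the same pattern."""
    if count > 0 and (count - 1) * stride + block > len(dst):
        raise ValueError("destination buffer too small for this access pattern")
    for i in range(count):
        dst[i * stride:i * stride + block] = packed[i * block:(i + 1) * block]


def time_call(fn, *args, repeats=100):
    """Average wall-clock seconds per call over `repeats` calls."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args)
    return (time.perf_counter() - start) / repeats


if __name__ == "__main__":
    data = bytes(1024 * 64)
    # Same payload (1 KB), contiguous versus strided layout.
    contiguous = time_call(pack, data, 1024, 1024, 1)
    strided = time_call(pack, data, 8, 512, 128)
    print(f"contiguous: {contiguous:.2e} s, strided: {strided:.2e} s")
```

In Python the measured times are dominated by interpreter overhead, so the numbers only show the shape of such an experiment (same payload, different access pattern), not hardware-level memory costs.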
Similar resources
Models and Resource Metrics for Parallel and Distributed Computation
This paper presents a framework for using resource metrics to characterize the various models of parallel computation. Our framework reflects the approach of recent models to abstract architectural details into several generic parameters, which we call resource metrics. We examine the different resource metrics chosen by different parallel models, categorizing the models into four classes: the basi...
Hiding Communication Costs in Bandwidth-Limited Parallel FFT Computation
This paper presents a novel computation schedule for FFT-type computations on a bandwidth-limited parallel computer. Using P processors, we are able to process an n-input FFT graph in the optimal time of (n log n)/P by carefully interleaving interprocessor communication steps with local computation. Our algorithm is suitable for both shared-memory and distributed-memory machines and is analyzed in...
Further Results with Algorithmic Skeletons for the CLUMPS Model of Parallel Computation
The CLUMPS (Campbell's Lenient, Unified Model of Parallel Systems) model of parallel computation is composed of an architectural model with an associated cost model. The architectural model employs a multi-level memory hierarchy, and so requires general locality of communication (communication between close processors). The multi-level memory hierarchy is reflected in the cost model which is based on...
Programming Data-parallel – Executing Process-parallel
Most theoretical work is based on the PRAM model, which has a block of shared memory and executes in a synchronous lock-step mode. Real hardware usually executes asynchronously and uses local memory and message passing. The recent LogP model reflects these architectural properties. We show that for a practically important subclass of PRAM programs it is possible to transform them into LogP-progra...